Across the sciences, the statistical analysis of networks is central to the production of knowledge 
on relational phenomena. Because of their ability to model the structural generation of networks, 
exponential random graph models are a ubiquitous means of analysis. However, they are limited by 
an inability to model networks with valued edges. We solve this problem by introducing a class of 
generalized exponential random graph models capable of modeling networks whose edges are valued, 
thus greatly expanding the scope of networks applied researchers can subject to statistical analysis. 
The need to analyze networks statistically transcends the disciplines that
have occasion to study relationships between units. Applications in
physics [1-5], computer science, the social sciences [3 H], and other fields
examine networks that vary in size and density, that change over time, and
whose edge values range from binary ties to counts to bounded and unbounded
continuous values. An important method for statistical inference on networks
is the exponential random graph model (ERGM) [9-11], which estimates the
probability of an observed network conditional on a vector of network
statistics that capture the generative structures in the network. Yet the
ERGM has a major limitation: it is only defined for networks with binary
ties [12, 13], thus excluding a wide range of networks with valued edges
(e.g., gene co-expression networks, passage time on networks of various
media, monetary transactions, casualties in conflict networks).

We develop a class of generalized ERGMs (GERGMs) for inference on networks
with continuous edge values, thus lifting the restriction of this
methodology to a possibly small subset of networks. The form of our
generalized model is similar to the ERGM in that it can be flexibly
specified to cover a broad range of generative features. The GERGM can be
estimated efficiently with a Gibbs sampler.

The strengths and limitations of the ERGM are apparent from its
specification. Let $Y$ be the $n$-vertex network (adjacency matrix) of
interest with $m$ edges ($m = n(n-1)$ if $Y$ is directed and $n(n-1)/2$ if
it is undirected). $Y_{ij}$ is the edge from $i$ to $j$. An ERGM of that
network is specified as:



$$
\mathcal{P}(Y, \theta) = \frac{\exp\{\theta' \mathbf{h}(Y)\}}{\sum_{\text{all } Y^* \in \mathcal{Y}} \exp\{\theta' \mathbf{h}(Y^*)\}} \tag{1}
$$



where $\theta$ is a parameter vector, $\mathbf{h}(Y)$ is a vector of
statistics on the network, and the object of inference is the probability of
the observed network among all possible permutations of the network given
the network statistics. The $\mathbf{h}(Y)$ term is what gives the ERGM much
of its power: this vector can contain statistics to capture the endogenous
structure of connectivity in the network (statistics can be included to
capture reciprocity, transitivity, cyclicality, and a wide variety of other
endogenous structures) as well as the effects of exogenous covariates.
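
As an illustration of equation (1), the following minimal Python sketch evaluates the ERGM probability of a three-vertex binary digraph by enumerating every network in the denominator. The choice of statistics (edge count and mutual-dyad count), the parameter values, and all function names are assumptions made for illustration. Enumeration is feasible here only because a three-vertex digraph has 2^6 = 64 configurations.

```python
import itertools
import math


def ergm_stats(y):
    """Return h(Y) = (edge count, mutual-dyad count) for a binary digraph.

    y is an n x n nested list of 0/1 entries; the diagonal is ignored.
    """
    n = len(y)
    edges = sum(y[i][j] for i in range(n) for j in range(n) if i != j)
    mutual = sum(y[i][j] * y[j][i] for i in range(n) for j in range(i + 1, n))
    return (edges, mutual)


def all_digraphs(n):
    """Yield every binary digraph on n vertices (2 ** (n * (n - 1)) of them)."""
    dyads = [(i, j) for i in range(n) for j in range(n) if i != j]
    for bits in itertools.product((0, 1), repeat=len(dyads)):
        y = [[0] * n for _ in range(n)]
        for (i, j), b in zip(dyads, bits):
            y[i][j] = b
        yield y


def ergm_probability(y, theta):
    """Evaluate equation (1), summing the denominator over all digraphs."""
    def weight(net):
        return math.exp(sum(t * s for t, s in zip(theta, ergm_stats(net))))

    normalizer = sum(weight(net) for net in all_digraphs(len(y)))
    return weight(y) / normalizer


observed = [[0, 1, 0],
            [1, 0, 0],
            [0, 0, 0]]
theta = (-1.0, 2.0)  # illustrative values, not estimates from the paper
p_observed = ergm_probability(observed, theta)
# The probabilities of all 64 digraphs should sum to one.
total = sum(ergm_probability(net, theta) for net in all_digraphs(3))
```

With theta = (0, 0) every digraph receives the same probability, 1/64, which is a quick sanity check on the normalizing sum.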

The challenges for modeling networks with valued edges are apparent from the
specification in equation (1). The flexibility of the distribution comes
from the lack of constraints in specifying $\mathbf{h}$; the only constraint
is that $\mathbf{h}$ is finite when evaluated on any binary network. This
assures that the denominator is a convergent sum, and therefore represents a
proper normalizing constant for the distribution of networks. However,
finiteness of $\mathbf{h}$ no longer assures convergence when the support of
$Y$ is infinite. The model we derive retains the flexibility of $\mathbf{h}$
within a framework that assures a proper probability distribution for $Y$
when $Y$ has continuous edges.

Our generalized ERGM operates by constructing joint continuous distributions
on networks that permit the representation of dependence features among the
elements of $Y$ through a set of statistics on the network, $\mathbf{h}(Y)$.
As in the ERGM, the vector $\mathbf{h}$ can be specified to represent many
forms of dependence, including transitivity (i.e., clustering), cycling, and
reciprocity. This is an important attribute of the model because such
dependence features characterize valued networks [13].

There are two specification steps in our approach to GERGMs: first, we
specify a tractable joint distribution that captures the dependencies of
interest on a restricted network, $X$, and then we transform $X$ onto the
support of $Y$, thus producing a probability model for $Y$. To illustrate
these steps, begin with consideration of the restricted valued network
$X \in [0, 1]^m$, where $m$ is the number of edges.

In our first specification step, h is formulated to repre- 
sent joint features of Y in the distribution of X: 



$$
f_X(X, \theta) = \frac{\exp[\theta' \mathbf{h}(X)]}{\int_{[0,1]^m} \exp[\theta' \mathbf{h}(Z)]\, dZ} \tag{2}
$$



where $\theta \in \mathbb{R}^p$ is the parameter vector,
$\mathbf{h}: [0, 1]^m \to \mathbb{R}^p$, $\mathbf{h}$ is finite on
$[0, 1]^m$, and the components $h_i(\cdot)$ are sums of subgraph products
such that, for every edge $X_{jk}$,
$\partial^2 h_i(X) / \partial X_{jk}^2 = 0$. This is a flexible
specification because many dependence relationships can be captured by
summing products over subgraphs of the network, particularly when the edges
are in the unit interval [IIT]. For instance, networks generated by a highly
reciprocal process are likely to exhibit high values of
$\sum_{i < j} X_{ij} X_{ji}$, and those in which connections gravitate toward
high-degree vertices exhibit high values of
$\sum_i \sum_{j,k \neq i} X_{ji} X_{ki}$ (i.e., "two-stars" fn\). An
important property of $f_X$ is that when $\theta = 0$, $X$ is a network of
independent uniform random variables.
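
The two example statistics can be computed on a unit-interval network as in the Python sketch below. It assumes that the two-star sum counts each unordered pair of senders once and excludes self-pairs; the function names and the example matrix are hypothetical. Because each statistic is a sum of products of distinct edges, it is linear in any single edge, which the helper `with_edge` makes easy to check.

```python
def reciprocity(x):
    """Sum over dyads i < j of X_ij * X_ji."""
    n = len(x)
    return sum(x[i][j] * x[j][i] for i in range(n) for j in range(i + 1, n))


def in_two_stars(x):
    """Sum over receivers i and sender pairs j < k (both != i) of X_ji * X_ki."""
    n = len(x)
    total = 0.0
    for i in range(n):
        senders = [j for j in range(n) if j != i]
        for a in range(len(senders)):
            for b in range(a + 1, len(senders)):
                total += x[senders[a]][i] * x[senders[b]][i]
    return total


def h(x):
    """Statistic vector h(X) = (reciprocity, in-two-stars)."""
    return (reciprocity(x), in_two_stars(x))


def with_edge(x, i, j, value):
    """Return a copy of x with edge (i, j) set to value."""
    copy = [row[:] for row in x]
    copy[i][j] = value
    return copy


# Hypothetical 3-vertex network with edges in [0, 1]; the diagonal is unused.
x = [[0.0, 0.9, 0.2],
     [0.8, 0.0, 0.1],
     [0.3, 0.7, 0.0]]
stats = h(x)
```

Setting one edge to 0.5 gives statistics halfway between those obtained at 0 and at 1, which is the linearity property in action.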

In our second specification step, we apply parameterized, one-to-one,
monotone increasing transformations ($G^{-1}(\cdot)$) to the $m$ edges of
the restricted network, thus transforming the restricted network $X$ onto
the support of the network of interest $Y$:
$Y_{ij} = G^{-1}(X_{ij}, \lambda_{ij})$, where $\lambda_{ij}$ parameterizes
the transformation to capture marginal features of $Y_{ij}$. Because
$dG^{-1}(X_{ij}, \lambda_{ij})/dX_{ij} > 0$, the properties of multivariate
transformations [15] imply that the distribution of $Y$ is
$f_Y(Y, \theta, \Lambda) = f_X(G(Y, \Lambda), \theta)|J|$, where $|J|$ is the
determinant of the Jacobian matrix $J$, the matrix of first partial
derivatives of $G(Y, \Lambda)$ with respect to $Y$. Since $J$ is a diagonal
matrix, with diagonal entries
$g(Y_{ij}, \lambda_{ij}) = dG(Y_{ij}, \lambda_{ij})/dY_{ij}$, we may write
the GERGM as



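$$
f_Y(Y, \theta, \Lambda) = \frac{\exp[\theta' \mathbf{h}(G(Y, \Lambda))]}{\int_{[0,1]^m} \exp[\theta' \mathbf{h}(Z)]\, dZ} \prod_{ij} g(Y_{ij}, \lambda_{ij}) \tag{3}
$$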



A useful way to specify $g$ is as a probability density function (i.e., $G$
is a CDF and $G^{-1}$ an inverse CDF) parameterized to match the support of
$Y$ and capture features of $Y$ such as location, scale, and dependence on
covariates. This approach to specifying $g$ has the elegant feature that the
distribution contains many common models for independent and identically
distributed variables as special cases when $\theta = 0$. For instance, if
$g$ is a Gaussian PDF with constant variance and the mean dependent on a
vector of covariates, the model reduces to that assumed in least squares
regression. The GERGM also allows hypothesis tests for block restrictions
(i.e., likelihood ratio or Wald tests) to test the assumption that the edges
of $Y$ are independent conditional upon $\Lambda$.

There are two ways to interpret dependence modeling of $Y$ via $X$. First,
following [13], who derive an ERGM-like model for a network with discrete
edges on the unit interval, $X$ can be interpreted as a standardized
relational intensity network. Second, and more directly, when $g$ is a PDF,
$X$ is the random variable drawn from the joint distribution of the
quantiles of $Y$. Therefore, the vectors $\mathbf{h}$ and $\theta$
characterize the dependencies among the quantiles of $Y$. The latter
interpretation closely resembles the process of constructing joint
distributions with copula functions [16, 17]. A simple example of deriving a
joint distribution through the combination of $\mathbf{h}$ and $g$ is
illustrated in figure 1, which presents the distributions of $X$ and $Y$ for
a directed network with two vertices exhibiting a high degree of
reciprocity.
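
The two-vertex example can be sketched numerically in Python. The sketch uses the specification stated in the caption of FIG. 1: h = {X12 + X21, X12 X21}, theta = {-3.5, 7}, and g the standard normal PDF. Approximating the normalizing constant with a midpoint rule on a grid is an assumption of this sketch, workable only because the integral is two-dimensional, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

THETA = (-3.5, 7.0)          # density and reciprocity parameters from FIG. 1
STD_NORMAL = NormalDist()    # g is the standard normal PDF, G its CDF


def theta_h(x12, x21, theta=THETA):
    """theta' h(X) with h = (X12 + X21, X12 * X21)."""
    return theta[0] * (x12 + x21) + theta[1] * x12 * x21


def normalizer(theta=THETA, grid=400):
    """Midpoint-rule approximation to the integral of exp[theta' h(Z)] on [0, 1]^2."""
    step = 1.0 / grid
    mids = [(k + 0.5) * step for k in range(grid)]
    total = sum(math.exp(theta_h(a, b, theta)) for a in mids for b in mids)
    return total * step * step


C = normalizer()


def f_x(x12, x21):
    """Density of the restricted network X on the unit square."""
    return math.exp(theta_h(x12, x21)) / C


def f_y(y12, y21):
    """Density of Y: f_X evaluated at G(Y) times the product of g(Y_ij)."""
    x12, x21 = STD_NORMAL.cdf(y12), STD_NORMAL.cdf(y21)
    jacobian = STD_NORMAL.pdf(y12) * STD_NORMAL.pdf(y21)
    return f_x(x12, x21) * jacobian
```

Under these parameters a reciprocated pair such as `f_y(1.5, 1.5)` has higher density than an unreciprocated pair such as `f_y(1.5, -1.5)`, matching the positive reciprocity effect described for the figure.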

Estimation of the parameters in the model is a nontrivial task. The greatest
challenge in estimating $\theta$ and $\Lambda$ in equation (3) is that the
integral in the denominator is typically intractable. Because of the
polynomial structure of $\mathbf{h}$, and the fact that the variables of
integration are bounded, we know that the integral is both positive and
finite, meaning $f_Y$ is a proper joint distribution. However, inference
requires the approximation of the denominator.

FIG. 1. Bivariate distributions for edges in a two-vertex digraph. The
darker the shading, the higher the relative likelihood of a point. In this
example, $g$ is the standard normal PDF, and $f_X$ is defined by
$\mathbf{h} = \{X_{12} + X_{21}, X_{12} X_{21}\}$ and
$\theta = \{-3.5, 7\}$, representing negative density and positive
reciprocity effects.

In order to approximate the denominator in equation (3), we sample from
$f_X$ using a Gibbs sampler. To do so, we require the conditional
distribution of $X_{ij} \mid X_{-ij}$. To simplify the notation, let
$\int_{[0,1]^m} \exp[\theta' \mathbf{h}(Z)]\, dZ = C(\theta)$.

The conditional distribution is given by



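$$
f_X(x_{ij} \mid X_{-ij}, \theta) = \frac{\exp\left[x_{ij}\, \theta' \frac{\partial \mathbf{h}(X)}{\partial X_{ij}}\right]}{\left(\theta' \frac{\partial \mathbf{h}(X)}{\partial X_{ij}}\right)^{-1} \left(\exp\left[\theta' \frac{\partial \mathbf{h}(X)}{\partial X_{ij}}\right] - 1\right)} \tag{4}
$$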



We may then draw from the conditional distribution in equation (4) using
the inverse CDF method. If $u$ is a uniform(0,1) random variable, then



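$$
x_{ij} \mid X_{-ij} = \frac{\ln\left[1 + u\left(\exp\left[\theta' \frac{\partial \mathbf{h}(X)}{\partial X_{ij}}\right] - 1\right)\right]}{\theta' \frac{\partial \mathbf{h}(X)}{\partial X_{ij}}} \tag{5}
$$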



When $\theta' \partial \mathbf{h}(X) / \partial X_{ij} = 0$, the conditional
density given in equation (4) is undefined. However, in this case, each
point in the unit interval is equally likely and the conditional
distribution of $X_{ij}$ is uniform(0,1).
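
A Gibbs sampler built on the inverse-CDF draw can be sketched in Python as follows. The sketch assumes h(X) = (sum of all edges, sum over dyads of X_ij X_ji), so that the quantity theta' dh/dX_ij equals theta_1 + theta_2 X_ji. This choice of statistics, the parameter values (borrowed from the FIG. 1 caption), and the function names are illustrative assumptions, not the specification used in the application.

```python
import math
import random


def draw_edge(a, u):
    """Inverse-CDF draw from the density proportional to exp(a * x) on [0, 1].

    a stands for theta' dh/dX_ij and u is a uniform(0, 1) variate.
    """
    if abs(a) < 1e-12:
        return u  # the conditional is uniform(0, 1) when a = 0
    return math.log1p(u * math.expm1(a)) / a


def gibbs_sample(n, theta, sweeps, rng):
    """Run Gibbs sweeps for h(X) = (sum of edges, sum_{i<j} X_ij * X_ji).

    Returns a list with one copy of the n x n network per completed sweep.
    """
    x = [[rng.random() if i != j else 0.0 for j in range(n)] for i in range(n)]
    draws = []
    for _ in range(sweeps):
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                a = theta[0] + theta[1] * x[j][i]  # theta' dh/dX_ij
                x[i][j] = draw_edge(a, rng.random())
        draws.append([row[:] for row in x])
    return draws


rng = random.Random(0)
draws = gibbs_sample(2, (-3.5, 7.0), sweeps=5000, rng=rng)
x12 = [d[0][1] for d in draws]
x21 = [d[1][0] for d in draws]
mean12 = sum(x12) / len(x12)
mean21 = sum(x21) / len(x21)
# A positive reciprocity parameter should induce positive covariance.
covariance = sum((a - mean12) * (b - mean21) for a, b in zip(x12, x21)) / len(draws)
```

Using `log1p` and `expm1` keeps the draw numerically stable when the coefficient is close to zero, and the explicit branch handles the exactly uniform case.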
In order to estimate $\theta$ and $\Lambda$, we maximize $\ln[f_Y]$:

$$
\ln[f_Y(Y, \theta, \Lambda)] = \theta' \mathbf{h}(G(Y, \Lambda)) + \sum_{ij} \ln[g(Y_{ij}, \lambda_{ij})] - \ln[C(\theta)]. \tag{6}
$$



Our algorithm proceeds iteratively, alternating maximum likelihood (ML)
estimation of $\Lambda \mid \theta$ and Markov chain Monte Carlo maximum
likelihood estimation (MCMC-MLE) of $\theta \mid \Lambda$ until convergence.
We derive an approximation to the asymptotic variance-covariance matrix by
the inverse of the negative Hessian matrix at the last iteration.

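The estimation of $\Lambda \mid \theta$ is straightforward. Because
$C(\theta)$ does not depend on $\Lambda$, ML estimation of
$\Lambda \mid \theta$ reduces to

$$
\arg\max_{\Lambda} \left\{ \theta' \mathbf{h}(G(Y, \Lambda)) + \sum_{ij} \ln[g(Y_{ij}, \lambda_{ij})] \right\} \tag{7}
$$

a function easy to maximize using a hill-climbing algorithm.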

The estimation of $\theta \mid \Lambda$ is more involved. Let
$\hat{X} = G(Y, \hat{\Lambda})$ be the estimate of the intensity/quantile
network given the current estimate of the transformation parameters. The
second term in equation (6) does not depend on $\theta$, so to estimate
$\theta \mid \Lambda$ we find

$$
\arg\max_{\theta} \left\{ \theta' \mathbf{h}(\hat{X}) - \ln[C(\theta)] \right\} \tag{8}
$$



which requires an approximation of $C(\theta)$. We approximate $C(\theta)$
using MCMC-MLE, itself an iterative method. Let $\theta^{[t-1]}$ be the
previous estimate of $\theta$, and let $X_1, \ldots, X_n$ be a sample of $n$
networks drawn from $f_X(X, \theta^{[t-1]})$. Then, an approximation to
$C(\theta)$ is given by

$$
\hat{C}(\theta) = C(\theta^{[t-1]}) \, \frac{1}{n} \sum_{j=1}^{n} \frac{\exp[\theta' \mathbf{h}(X_j)]}{\exp[\theta^{[t-1]\prime} \mathbf{h}(X_j)]} \tag{9}
$$

This requires a starting value for $\theta$. In simulation experiments, we
have found the pseudolikelihood estimate
($\arg\max_{\theta} \sum_{ij} \ln[f_X(X_{ij} \mid X_{-ij}, \theta)]$) to be
effective in providing starting values for $\theta$ (i.e., $\theta^{[0]}$).
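
The sample-average approximation to the normalizing constant can be sketched in Python for the two-vertex statistics h = (X12 + X21, X12 X21). To keep the example self-contained it assumes the previous estimate is theta = 0, where the edges are independent uniforms and the constant equals 1, so no Gibbs sampler is needed; in general the sample would come from a sampler run at the previous estimate. The candidate value of theta, the grid check, and the function names are illustrative assumptions.

```python
import math
import random


def h(x12, x21):
    """Statistics of the two-vertex example: (edge sum, reciprocity product)."""
    return (x12 + x21, x12 * x21)


def dot(theta, stats):
    """Inner product theta' h."""
    return sum(t * s for t, s in zip(theta, stats))


def c_ratio(theta, theta_prev, sample):
    """Sample average of exp[theta' h(X_j)] / exp[theta_prev' h(X_j)].

    This estimates C(theta) / C(theta_prev) when the sample is drawn
    from f_X at theta_prev.
    """
    terms = [math.exp(dot(theta, h(*x)) - dot(theta_prev, h(*x))) for x in sample]
    return sum(terms) / len(terms)


def c_by_grid(theta, grid=300):
    """Midpoint-rule value of C(theta), used here only as a check."""
    step = 1.0 / grid
    mids = [(k + 0.5) * step for k in range(grid)]
    total = sum(math.exp(dot(theta, h(a, b))) for a in mids for b in mids)
    return total * step * step


rng = random.Random(1)
theta_prev = (0.0, 0.0)  # at theta = 0 the edges are uniform and C = 1
sample = [(rng.random(), rng.random()) for _ in range(20000)]
theta = (-1.0, 2.0)      # illustrative candidate value
c_hat = 1.0 * c_ratio(theta, theta_prev, sample)  # C(theta_prev) = 1
c_exact = c_by_grid(theta)
```

The approximation is most reliable when the candidate value is close to the previous estimate, which is one reason the procedure is iterated.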

We illustrate important features of the GERGM and demonstrate its efficacy
by applying it to a real-world network: domestic migration in the United
States [18, 19]. We model changes in the directional migration flows between
the 50 United States (as well as Washington D.C. and Puerto Rico) between
2006 and 2007. $Y_{ij}$ is the difference between the number of people who
migrated from state $i$ to state $j$ in 2007 and the number who migrated
from $i$ to $j$ in 2006. These data allow us to consider the GERGM in the
context of a valued network requiring transformation away from an intensity
network onto a continuous unbounded support with exogenous covariates and
endogenous parameters, thus making full use of the GERGM's flexibility. We
use the Cauchy distribution as our $g$ function because its thick tails
capture the high empirical kurtosis (637) of the network [20]. Thus, in the
case where the edges of the network are independent conditional on the
covariates, this specification reduces to a generalized linear model (GLM)
[21] with a Cauchy link function. Because previous work on interstate
migration [22] suggests that population, unemployment, per-capita income,
and mean January temperature of both the sending and receiving states are
significant determinants of migration, we include the change in each of
these variables from 2005 to 2006 as covariates in our GERGM. We complete
our specification by including endogenous dependence terms for clustering,
dyadic reciprocity, generalized reciprocity (i.e., cycling, the degree to
which changes in flows to and from a state are correlated [23]), state-level
attraction, and state-level repellence.

FIG. 2. Estimates of the parameters: bars span 95% confidence intervals.
5,000 draws for three iterations were used in the MCMC-MLE.

FIG. 3. Reciprocal Feature Prediction, with panels (a) Cycles and (b) Dyadic
Reciprocation: The boxplots represent the respective dependence statistic
computed on 1,000 instances of the latent intensity network drawn from each
model. Let $\hat{X}$ be the respective estimate of the intensity network
obtained as the CDF evaluated at the transformation parameters
($\hat{\Lambda}$) for the GERGM and Cauchy GLM. Then cycles (a) is
$\sum_{i < j < k} \hat{X}_{ij} \hat{X}_{jk} \hat{X}_{ki} + \hat{X}_{ik} \hat{X}_{kj} \hat{X}_{ji}$,
and dyad reciprocation (b) is $\sum_{i < j} \hat{X}_{ij} \hat{X}_{ji}$.
Horizontal grey bars are placed at the statistic computed on the estimated
intensity network.

Figure 2 shows the estimates from our GERGM as well as estimates from a
Cauchy GLM. A Wald test suggests that restricting the dependence terms to
zero, as the regression model does, is inappropriate and that the GERGM
provides a better fit to the data (Wald statistic = 119.19 on 5 degrees of
freedom, statistically significant at the 0.001 level). The statistically
significant effects for the network parameters indicate that (a) there are
clustering effects in the network, (b) migration to states repels further
migration, and (c) increases in migration flows from a state are not offset
by increases in flows to that state. We also find a decrease in the number
of people leaving warm states, a decrease in migration to states that
experienced a substantial increase in population in the previous year, and
evidence of an increase in migration away from states experiencing
increases in unemployment.
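
The significance level reported for the Wald test can be checked with a few lines of Python. The sketch assumes the statistic is referred to a chi-square distribution with 5 degrees of freedom, for which the upper-tail probability has a closed form in terms of the complementary error function; the function name is hypothetical.

```python
import math


def chi2_sf_5df(x):
    """Upper-tail probability of a chi-square variable with 5 degrees of freedom."""
    if x <= 0:
        return 1.0
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0) * (1.0 + x / 3.0))


wald_statistic = 119.19  # value reported in the text
p_value = chi2_sf_5df(wald_statistic)
significant_at_001 = p_value < 0.001
```

The same function reproduces the familiar 5% critical value: `chi2_sf_5df(11.0705)` is approximately 0.05.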

The superior performance of the GERGM relative to the Cauchy regression is
further depicted in figure 3, which gives the predicted and observed
network-level reciprocity and cycling measures from the GERGM and Cauchy
GLM. This figure shows that the regression does not adequately fit the lack
of reciprocity in the migration network. Theoretically, it is expected that
a network of change in migration would exhibit anti-reciprocity and
anti-cycling. If a locale is experiencing a spike in migration to other
places, that is likely indicative of some undesirable feature of said
locale. This anti-reciprocal feature of the migration network cannot be
integrated into the conventional regression modeling framework.

The GERGM greatly expands the scope of networks that can be modeled within
the ERGM framework. We used it to analyze a real-world network and produce
insights that could not be produced without the GERGM. Our general model
represents a major advance in the statistical analysis of networks, and we
expect it to become a common tool in disciplines spanning the sciences.